Toddler-Inspired Visual Object Learning
Real-world learning systems have practical limitations on the quality and quantity of the training datasets that they can collect and consider. How should a system go about choosing a subset of the possible training examples that still allows for learning accurate, generalizable models? To help address this question, we draw inspiration from a highly efficient practical learning system: the human child. Using head-mounted cameras, eye gaze trackers, and a model of foveated vision, we collected first-person (egocentric) images that represent a highly accurate approximation of the "training data" that toddlers' visual systems collect in everyday, naturalistic learning contexts. We used state-of-the-art computer vision learning models (convolutional neural networks) to help characterize the structure of these data, and found that child data produce significantly better object models than egocentric data experienced by adults in exactly the same environment. By using the CNNs as a modeling tool to investigate the properties of the child data that may enable this rapid learning, we found that child data exhibit a unique combination of quality and diversity, with not only many similar large, high-quality object views but also a greater number and diversity of rare views. This novel methodology of analyzing the visual training data used by children may not only reveal insights that improve machine learning, but may also suggest new experimental tools to better understand infant learning in developmental psychology.
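The abstract mentions a model of foveated vision but gives no details of it. As a rough illustration of the general idea only, and not the authors' model, the sketch below keeps pixels near a gaze point sharp and blends toward a blurred copy as distance from the gaze point grows. The function names (`box_blur`, `foveate`), the mean filter, the linear falloff, and the default parameter values are all assumptions chosen for simplicity.

```python
import math


def box_blur(img, radius):
    """Blur a grayscale image (list of rows of floats) with a square mean filter."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            total, count = 0.0, 0
            # Average over the window, clipped at the image borders.
            for yy in range(max(0, y - radius), min(h, y + radius + 1)):
                for xx in range(max(0, x - radius), min(w, x + radius + 1)):
                    total += img[yy][xx]
                    count += 1
            out[y][x] = total / count
    return out


def foveate(img, gaze_xy, fovea_radius=2.0, blur_radius=2):
    """Hypothetical foveation: sharp near the gaze point, blurred in the periphery.

    gaze_xy is (x, y) in pixel coordinates; fovea_radius must be positive.
    """
    gx, gy = gaze_xy
    blurred = box_blur(img, blur_radius)
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            dist = math.hypot(x - gx, y - gy)
            # Weight is 0 inside the fovea and rises linearly to 1
            # at twice the fovea radius (an assumed falloff).
            weight = min(1.0, max(0.0, (dist - fovea_radius) / fovea_radius))
            out[y][x] = (1.0 - weight) * img[y][x] + weight * blurred[y][x]
    return out
```

Calling `foveate(img, (4, 4))` on a 9x9 image would leave the pixel at the gaze point unchanged and replace the corners with their blurred values. A real retinal model would use an acuity falloff measured in degrees of visual angle rather than this pixel-based ramp.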
Reviews: Toddler-Inspired Visual Object Learning
The goal of the paper is to "data mine" records of toddlers' and their mothers' fixations while playing with a set of 24 toys in order to observe what might be good training data for a deep network, given a fixed training budget. The idea is that the toddler is the best visual learning system we know, and so the data that toddlers learn from should give us a clue about what data is appropriate for deep learning. The authors take fixation records from toddlers (16 to 24 months old) and their mothers, collected via scene cameras and eye tracking, to examine the distribution of the infants' and the mothers' visual input. This study clearly falls under the cognitive science umbrella at NIPS, although the authors try to make it about deep learning. For example, if they only cared about deep learning, they would not use a retinal filter. First, they manually collect data recording which toys the infants and mothers are fixating on (ignoring other fixations).
Toddler-Inspired Visual Object Learning
Sven Bambach, David Crandall, Linda Smith, Chen Yu